ENV 859 - Fall 2024
© John Fay, Duke University
In this notebook, we examine how GeoPandas is used in performing a spatial analysis. As an example, we look at the demographic characteristics of where electric vehicle (EV) charging stations have been located in Durham, Wake, and Orange counties with respect to 2010 Census and Social Vulnerability Index values. (It's not a very sensible analysis as done here, but we are concentrating on the mechanics more than the utility of the analysis...)
Here we'll gather and explore the data we'll be using in our analysis. This includes two datasets. First is the list of EV Charging locations, stored as a CSV file in our data folder. This dataset has coordinate columns that we'll use to construct points and convert into a geodataframe as we learned in our previous lessons.
The second dataset comprises 2010 Census block group data for all of North Carolina. We'll fetch these data from an online resource using a web service. We'll revisit how web services work later; for now, we'll use this process to fetch data for three counties: Durham, Wake, and Orange.
For each dataset, we'll get the data into geodataframe format and then explore the data in various ways. Then we'll move to Part 2, where we analyze the data.
! pip install contextily
#Import packages
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
Successfully installed affine-2.4.0 contextily-1.6.2 geographiclib-2.0 geopy-2.4.1 joblib-1.4.2 mercantile-1.2.1 rasterio-1.4.2 xyzservices-2024.9.0
#Command to run if autocomplete is slooooowwww
%config Completer.use_jedi = False
→ Can you explain what role each imported package might play in our analysis?
As done in a previous notebook, we want to: read in the charging stations CSV, create a geoseries from its coordinate columns, and convert the result to a geodataframe.
#Read in charging stations CSV, convert to geodataframe
df = pd.read_csv('../data/NC_Charging_Stations.csv')
#Create the geoseries from the X and Y columns
geom = gpd.points_from_xy(x=df['Longitude'], y=df['Latitude'])
#Convert to a geodataframe, setting the CRS to WGS 84 (wkid=4326)
gdf_stations_all = gpd.GeoDataFrame(df,geometry = geom, crs=4326)
Have a quick look at the contents imported. Things to check include:
#Show the number of rows and columns
gdf_stations_all.shape
(1227, 11)
#Examine the structure of the dataframe
gdf_stations_all.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 1227 entries, 0 to 1226 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 1227 non-null int64 1 Fuel Type Code 1227 non-null object 2 Station Name 1227 non-null object 3 City 1227 non-null object 4 State 1227 non-null object 5 ZIP 1227 non-null object 6 Status Code 1227 non-null object 7 Latitude 1227 non-null float64 8 Longitude 1227 non-null float64 9 Facility Type 450 non-null object 10 geometry 1227 non-null geometry dtypes: float64(2), geometry(1), int64(1), object(7) memory usage: 105.6+ KB
#Examine a single record of data
gdf_stations_all.iloc[0]
ID 39016 Fuel Type Code ELEC Station Name City of Raleigh - Municipal Building City Raleigh State NC ZIP 27601 Status Code E Latitude 35.778416 Longitude -78.64347 Facility Type STREET_PARKING geometry POINT (-78.64347 35.778416) Name: 0, dtype: object
#Reveal the geometry type(s) contained in the geodataframe
gdf_stations_all['geometry'].type.unique()
array(['Point'], dtype=object)
#Plot the data
gdf_stations_all.plot().grid()
→ Could you perform the same steps of importing and exploring the USGS gage locations in NC stored in the CSV file ../data/gages.csv?
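As a hedged sketch of the pattern for the gages challenge, here is the same CSV-to-geodataframe workflow on a tiny inline frame (the column names `Longitude`/`Latitude` are assumptions — check `df.columns` after reading the real `../data/gages.csv`, since USGS files often use different names):

```python
import pandas as pd
import geopandas as gpd

# Toy frame standing in for the gages CSV; swap in
# pd.read_csv('../data/gages.csv') and the actual coordinate columns.
df_demo = pd.DataFrame({'site': ['A', 'B'],
                        'Longitude': [-78.9, -78.6],
                        'Latitude': [36.0, 35.8]})

# Build point geometries from the coordinate columns
geom = gpd.points_from_xy(x=df_demo['Longitude'], y=df_demo['Latitude'])

# Convert to a geodataframe, tagging the CRS as WGS 84
gdf_demo = gpd.GeoDataFrame(df_demo, geometry=geom, crs=4326)
print(gdf_demo.shape)
```

The exploration steps (`shape`, `info()`, `iloc[0]`, `plot()`) then work exactly as they did for the charging stations.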
gdf_gages
We will explore web services a bit later, but we'll use the code below to acquire polygon data of census block groups for Durham, Wake, and Orange counties from an NC OneMap web service. Once imported, we'll merge these geodataframes together and use them in our subsequent analyses.
(We'll examine how this works a bit later...)
#Create a function to read NCOneMap feature services into a geodataframe
def getBlockGroupData(FIPS):
#Construct the url from the function arguments
url=f'https://services.nconemap.gov/secure/rest/services/NC1Map_Census/FeatureServer/8/query?' + \
f"where=GEOID10+LIKE+'{FIPS}%'&outFields=GEOID10,TOTAL_POP&f=geojson"
#Create a geodataframe from the URL
gdf = gpd.read_file(url)
#Return the geodataframe
return gdf
#Fetch census block groups for Durham, Orange, and Wake counties using the above function
gdf_DurmBlkGroups = getBlockGroupData(37063)
gdf_WakeBlkGroups = getBlockGroupData(37183)
gdf_OrangeBlkGroups = getBlockGroupData(37135)
#Challenge: See if you can fetch Chatham county block groups (FIPS = 37037)
gdf_Chatham = getBlockGroupData(37037)
#Show the Durham block group geodataframe's coordinate reference system
gdf_DurmBlkGroups.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
#Explore the Durham block group geodataframe's columns...
gdf_DurmBlkGroups.columns
Index(['geoid10', 'total_pop', 'geometry'], dtype='object')
#Examine a sample record from the geodataframe
gdf_DurmBlkGroups.iloc[10]
geoid10 370630007002 total_pop 1030 geometry POLYGON ((-78.9175097042526 35.97670311102937,... Name: 10, dtype: object
#Plot Durham's population, setting the colormap to "viridis"
gdf_DurmBlkGroups.plot(column = 'total_pop', cmap = 'viridis')
<AxesSubplot: >
#Plot the block groups for all three counties
#First, plot one geodataframe, saving the output as "thePlot"
thePlot = gdf_DurmBlkGroups.plot(color = 'Blue')
#Plot another geodataframe, telling it to use the axes in "thePlot" created above
gdf_WakeBlkGroups.plot(ax=thePlot, color = 'Red')
gdf_OrangeBlkGroups.plot(ax=thePlot, color = 'LightBlue')
#Repeat...
<AxesSubplot: >
Now that we've obtained a few datasets and gotten them into geodataframes, we'll perform some analyses. These include: subsetting features by attribute, appending geodataframes, dissolving features, joining tabular data, and selecting and joining features spatially.
Doc: https://geopandas.org/indexing.html
Subsetting features in a geodataframe uses the same methods as subsetting records in a Pandas dataframe. Here we'll run through an example by subsetting EV stations found only within Durham, Raleigh, and Chapel Hill.
#Reveal the unique values in the City column
gdf_stations_all['City'].unique()
array(['Raleigh', 'Concord', 'Fayetteville', 'High Point', 'Lumberton',
'Durham', 'Charlotte', 'Sanford', 'Burlington', 'Asheville',
'Forest City', 'Gastonia', 'Goldsboro', 'Greensboro', 'Greenville',
'Hendersonville', 'Hickory', 'Jacksonville', 'Montreat',
'New Bern', 'Reidsville', 'Roanoke Rapids', 'Rockingham',
'Mt. Airy', 'Salisbury', 'Southern Pines', 'Statesville',
'Wake Forest', 'Wilkesboro', 'Wilmington', 'Wilson',
'Winston-Salem', 'Clyde', 'Cornelius', 'Elizabeth City',
'Asheboro', 'Black Mountain', 'Boone', 'Cary', 'Chapel Hill',
'Hillsborough', 'Cherokee', 'Flat Rock', 'Clinton', 'Knightdale',
'Mt. Holly', 'Mooresville', 'Weaverville', 'Fletcher', 'Pittsboro',
'Belmont', 'Lowell', 'Dallas', 'Lincolnton', 'Arden',
'Kernsersville', 'Mebane', 'Point Harbor', 'Nags Head',
'Huntersville', 'Ocean Isle Beach', 'Holden Beach', 'Lenoir',
'Waynesville', 'Maggie Valley', 'Granite Falls', 'Hot Springs',
'West Jefferson', 'Cullowhee', 'Graham', 'Indian Trail',
'Kings Mountain', 'Old Fort', 'Henderson', 'Garner', 'Roxboro',
'Castle Hayne', 'Star', 'Carrboro', 'Brevard', 'Archdale', 'Sylva',
'Dillsboro', 'Kinston', 'Oriental', 'Monroe', 'Pembroke',
'Lake Lure', 'Calabash', 'Shallotte', 'Research Triangle Park',
'Cherokee,', 'Gibsonville', 'Elkin', 'Dobson', 'Davidson',
'Franklin', 'Rocky Mount', 'Wallace', 'Hatteras', 'Mills River',
'Banner Elk', 'Beaufort', 'Benson', 'Blowing Rock', 'Buxton',
'Cape Carteret', 'Cashiers', 'Duck', 'Elon', 'Highlands',
'Kannapolis', 'Kill Devil Hills', 'Manteo', 'Murphy', 'Newton',
'Pinehurst', 'Robbinsville', 'Saluda', 'Selma', 'Southport',
'Tryon', 'Whitsett', 'Lexington', 'Shelby', 'Burgaw', 'Matthews',
'Plymouth', 'Madison', 'Avon', 'Fuquay Varina', 'Moncure',
'Mt Airy', 'King', 'Smithfield', 'Dunn', 'Lilesville', 'Raeford',
'Snow Hill', 'Evergreen', 'Edenton', 'Dudley', 'Sneads Ferry',
'Aulander', 'Mcleansville', 'Lillington', 'Sunset Beach',
'Bessemer City', 'Leland', 'Conover', 'Emerald Isle', 'Swansboro',
'Aberdeen', 'Marion', 'Enfield', 'Nashville', 'Warrenton',
'Powells Point', 'Hamptonville', 'Winterville', 'Linville', 'Apex',
'Cherry Point', 'Morrisville', 'Butner', 'SOUTHPORT', 'Haw River',
'Colfax', 'Halifax', 'Mocksville', 'Albemarle', 'Sparta',
'Troutman', 'Hampstead', 'Siler City', 'Tarboro', 'Sugar Grove',
'Cramerton', 'Hope Mills', 'Kitty Hawk', 'Robbins', 'Harrisburg',
'Spindale', 'Yadkinville', 'Pilot Mountain', 'Ellerbe',
'Burnsville', 'Scotland Neck', 'Rose Hill', 'Elizabethtown',
'Warsaw', 'Waxhaw', 'Pisgah Forest', 'Sugar Mountain', 'Littleton',
'Mars Hill', 'Marshville', 'Holly Springs', 'Louisburg',
'Oak Island', 'Washington', 'Bakersville', 'Windsor', 'Candler',
'Rich Square', 'Thomasville', 'Jackson', 'Troy', 'Morehead City',
'Kernersville', 'Oxford', 'Corolla', 'Rolesville', 'Whiteville',
'Jamestown', 'Mint Hill'], dtype=object)
#Subset records where the City is "Durham", "Raleigh", or "Chapel Hill"
gdf_stations = gdf_stations_all.query('City in ("Raleigh", "Durham", "Chapel Hill")')
gdf_stations.shape
(205, 11)
#Recall, an alternative is to use masks...
gdf_stations = gdf_stations_all.loc[
(gdf_stations_all['City'] == 'Raleigh') |
(gdf_stations_all['City'] == 'Durham') |
(gdf_stations_all['City'] == 'Chapel Hill')
]
gdf_stations.shape
(205, 11)
#Another, more efficient mask using `isin`
gdf_stations = gdf_stations_all.loc[gdf_stations_all['City'].isin(("Raleigh", "Durham", "Chapel Hill"))]
gdf_stations.shape
(205, 11)
#Plot the results
myPlot = gdf_DurmBlkGroups.plot()
gdf_stations.plot("City", ax = myPlot);
#Plot them with a base map (using Contextily - more later...)
city_plot = gdf_stations.to_crs(3857).plot(column="City",figsize=(10,10),alpha=0.6)
ctx.add_basemap(city_plot)
Doc: https://geopandas.org/mergingdata.html
We'll start by appending the Wake Co. dataset to the Durham Co. one; then you will append the Orange Co. dataframe to that product. (Note: the old `append()` method has been removed from recent versions of Pandas, so we use `pd.concat()` instead.)
→ If the coordinate reference systems were different, we could use to_crs() to transform one dataset to match the crs of the other.
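As a standalone illustration of the concat pattern we'll use below (toy frames, not the actual census layers):

```python
import pandas as pd

# Toy frames standing in for two county block group layers
df_a = pd.DataFrame({'geoid': ['1', '2'], 'County': 'Durham'})
df_b = pd.DataFrame({'geoid': ['3'], 'County': 'Wake'})

# ignore_index=True rebuilds the row index so it runs 0,1,2
# instead of repeating the source indices 0,1,0
combined = pd.concat([df_a, df_b], ignore_index=True)
print(combined.shape)
```

The row count of the result should equal the sum of the inputs' row counts — the same check we'll apply to the merged block groups.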
#Check the crs of the two geodataframes
#Add a field to each input, setting values to identify the source dataset
#Append the Wake Co features to the Durham Co features,
gdf_BlkGrp_step1 =
#Check to see that the total rows in the merged gdf match the sum of the two component gdfs
#Plot the result
Append the Orange Co block group features to the gdf_BlkGrp_step1 geodataframe we just created.
Remember to:
- Check that the coordinate reference systems are the same
- Add a County field to gdf_OrangeBlkGroups, setting its value to the county name
- Save the result as gdf_BlkGrps
#Check that the coordinate reference systems are the same
gdf_DurmBlkGroups.crs == gdf_OrangeBlkGroups.crs
True
#Add the county field
gdf_DurmBlkGroups['County'] = 'Durham'
gdf_OrangeBlkGroups['County'] = 'Orange'
gdf_WakeBlkGroups['County'] = 'Wake'
#Append the geodataframes
gdf_BlkGrp_step1 = pd.concat([gdf_DurmBlkGroups,gdf_WakeBlkGroups])
gdf_BlkGrps = pd.concat([gdf_BlkGrp_step1, gdf_OrangeBlkGroups])
#Plot the output
gdf_BlkGrps.plot(column = 'County')
<AxesSubplot: >
We have Social Vulnerability Data to examine in our analysis, but these data are at the Tract, not BlockGroup level. Thus, to join these attributes to our geodataframe, we'll need to aggregate our blockgroups to the tract level. Fortunately, the GEOID10 attribute is structured such that the census tract is just the first 11 characters. So we will create a new column holding these first 11 characters, and then we'll dissolve our blockgroup features sharing the same tract ID to single features.
Doc: https://geopandas.org/aggregation_with_dissolve.html
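A quick standalone check of the ID-slicing logic described above (the GEOID used here is one of the Durham block groups that appears in the output below):

```python
# GEOID10 structure: 2 chars state + 3 county + 6 tract + 1 block group = 12
geoid = '370630003023'

state = geoid[:2]       # '37'  -> North Carolina
county = geoid[2:5]     # '063' -> Durham County
tract_id = geoid[:11]   # state + county + tract = the tract identifier
print(state, county, tract_id)
```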
#Create the Tract column
gdf_BlkGrps['TRACT']= gdf_BlkGrps['geoid10'].str[0:11]
gdf_BlkGrps.head()
| geoid10 | total_pop | geometry | County | TRACT | |
|---|---|---|---|---|---|
| 0 | 370630003023 | 1533 | POLYGON ((-78.89598 36.00893, -78.89597 36.010... | Durham | 37063000302 |
| 1 | 370630003021 | 779 | POLYGON ((-78.89606 36.01699, -78.89708 36.017... | Durham | 37063000302 |
| 2 | 370630003022 | 1114 | POLYGON ((-78.9018 36.01289, -78.90217 36.0129... | Durham | 37063000302 |
| 3 | 370630005003 | 1289 | POLYGON ((-78.92329 35.99672, -78.92398 35.995... | Durham | 37063000500 |
| 4 | 370630004011 | 898 | POLYGON ((-78.9361 36.02206, -78.93612 36.0216... | Durham | 37063000401 |
#Dissolve features on tract, computing summed population
gdf_Tract = gdf_BlkGrps.dissolve(
'TRACT',
aggfunc = {
'total_pop': 'sum',
'County': 'first'
})
#Show the results
gdf_Tract.head()
| geometry | total_pop | County | |
|---|---|---|---|
| TRACT | |||
| 37063000101 | POLYGON ((-78.87411 36.02349, -78.87417 36.023... | 3152 | Durham |
| 37063000102 | POLYGON ((-78.89264 36.01976, -78.89268 36.018... | 4535 | Durham |
| 37063000200 | POLYGON ((-78.88339 36.00374, -78.88394 36.003... | 2946 | Durham |
| 37063000301 | POLYGON ((-78.91097 36.0123, -78.91102 36.0114... | 2504 | Durham |
| 37063000302 | POLYGON ((-78.89598 36.00893, -78.89632 36.008... | 3426 | Durham |
#Plot the data
gdf_Tract.plot(column = 'total_pop')
<AxesSubplot: >
Now that we have the data at the tract level, we can join the Social Vulnerability Index data, stored in a CSV file (../data/NC_SVI_2018.csv).
Doc: https://geopandas.org/mergingdata.html#attribute-joins
#Import and explore the SVI data
df_SVI = pd.read_csv('../data/NC_SVI_2018.csv', dtype= {'FIPS': 'str', 'ST':'str','STCNTY': 'str'})
df_SVI.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2192 entries, 0 to 2191 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ST 2192 non-null object 1 STATE 2192 non-null object 2 STCNTY 2192 non-null object 3 COUNTY 2192 non-null object 4 FIPS 2192 non-null object 5 SVI 2192 non-null float64 dtypes: float64(1), object(5) memory usage: 102.9+ KB
Challenge:
→ Modify the read_csv() command above so that 'ST' and 'STCNTY' are also imported as strings.
#Plot a histogram of the SVI values
df_SVI['SVI'].hist()
<AxesSubplot: >
#Create a mask of values greater than or equal to zero
valid_mask = df_SVI['SVI'] >= 0
valid_mask
#Apply that mask
df_SVI_fixed = df_SVI.loc[valid_mask]
#View the histogram again
df_SVI_fixed.hist()
array([[<AxesSubplot: title={'center': 'SVI'}>]], dtype=object)
Phew! Exploring the data paid off!
#Have a look at the merge command syntax
gdf_Tract.merge?
#Join the SVI data to the tract features
gdf_Tract_joined = gdf_Tract.merge(
df_SVI_fixed,
left_on = 'TRACT',
right_on = 'FIPS',
how = 'left')
#Examine the output
gdf_Tract_joined.head()
| geometry | total_pop | County | ST | STATE | STCNTY | COUNTY | FIPS | SVI | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((-78.87411 36.02349, -78.87417 36.023... | 3152 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000101 | 0.8080 |
| 1 | POLYGON ((-78.89264 36.01976, -78.89268 36.018... | 4535 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000102 | 0.8547 |
| 2 | POLYGON ((-78.88339 36.00374, -78.88394 36.003... | 2946 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000200 | 0.5706 |
| 3 | POLYGON ((-78.91097 36.0123, -78.91102 36.0114... | 2504 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000301 | 0.7048 |
| 4 | POLYGON ((-78.89598 36.00893, -78.89632 36.008... | 3426 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000302 | 0.3392 |
#Explore the output, looking for null values
gdf_Tract_joined.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 275 entries, 0 to 274 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 geometry 275 non-null geometry 1 total_pop 275 non-null int32 2 County 275 non-null object 3 ST 270 non-null object 4 STATE 270 non-null object 5 STCNTY 270 non-null object 6 COUNTY 270 non-null object 7 FIPS 270 non-null object 8 SVI 270 non-null float64 dtypes: float64(1), geometry(1), int32(1), object(6) memory usage: 18.4+ KB
#Plot the output
gdf_Tract_joined.plot(column= 'SVI')
<AxesSubplot: >
Looks like we have some features missing SVI data. Let's examine those more closely.
#Create a mask of null SVI values
null_mask = gdf_Tract_joined['SVI'].isnull()
null_mask.sum()
5
#Apply the mask
gdf_Tract_joined.loc[null_mask]
| geometry | total_pop | County | ST | STATE | STCNTY | COUNTY | FIPS | SVI | |
|---|---|---|---|---|---|---|---|---|---|
| 20 | POLYGON ((-78.91595 36.00978, -78.91718 36.009... | 1941 | Durham | NaN | NaN | NaN | NaN | NaN | NaN |
| 59 | POLYGON ((-78.88307 35.88604, -78.88532 35.886... | 4 | Durham | NaN | NaN | NaN | NaN | NaN | NaN |
| 79 | POLYGON ((-79.04624 35.91339, -79.04606 35.913... | 2350 | Orange | NaN | NaN | NaN | NaN | NaN | NaN |
| 273 | POLYGON ((-78.80909 35.88187, -78.80944 35.881... | 2 | Wake | NaN | NaN | NaN | NaN | NaN | NaN |
| 274 | POLYGON ((-78.74827 35.87112, -78.74843 35.871... | 30 | Wake | NaN | NaN | NaN | NaN | NaN | NaN |
We can either assign a value to these missing values or leave them as no data. We'll just leave them blank for now...
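If we did later decide to fill those gaps, pandas' fillna() is one option. A minimal sketch on a toy series (the median fill here is purely illustrative, not a recommendation for SVI data):

```python
import pandas as pd
import numpy as np

# Toy series standing in for an SVI column with a missing value
svi = pd.Series([0.81, np.nan, 0.34])

# Replace nulls with the median of the non-null values
filled = svi.fillna(svi.median())
print(filled.isnull().sum())
```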
Our combined dataframes have a field indicating the total population in each block group. We want to compute population density from this and from the area of each tract. We don't yet have an area field in our dataframe, but we can compute that from our spatial features. But before we can do this, we need to transform our data into a projected coordinate system. So... the steps for this analysis include:
- Project the data to a projected coordinate system (UTM Zone 17N)
- Compute an Area_km2 column in our dataframe from the feature geometries
- Compute a PopDens column in our dataframe by dividing total_pop by Area_km2

#Project the data to UTM Zone 17N (EPSG 32617)
gdf_Tract_utm = gdf_Tract_joined.to_crs(32617)
gdf_Tract_utm.head()
| geometry | total_pop | County | ST | STATE | STCNTY | COUNTY | FIPS | SVI | PD_log | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((691558.001 3988644.329, 691553.732 3... | 3152 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000101 | 0.8080 | 6.829620 |
| 1 | POLYGON ((689897.492 3988193.797, 689896.525 3... | 4535 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000102 | 0.8547 | 7.029405 |
| 2 | POLYGON ((690769.57 3986435.083, 690720.109 39... | 2946 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000200 | 0.5706 | 6.882403 |
| 3 | POLYGON ((688263.171 3987331.051, 688261.05 39... | 2504 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000301 | 0.7048 | 7.494074 |
| 4 | POLYGON ((689621.976 3986986.04, 689591.484 39... | 3426 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000302 | 0.3392 | 7.595497 |
#Compute a new column of geometry area (in sq km)
gdf_Tract_utm['Area_km2'] = gdf_Tract_utm['geometry'].area / 1000000
#Compute a new column of population density
gdf_Tract_utm['PopDens'] = gdf_Tract_utm['total_pop'] / gdf_Tract_utm['Area_km2']
#Plot the distribution of areas
gdf_Tract_utm.hist(column = 'Area_km2')
array([[<AxesSubplot: title={'center': 'Area_km2'}>]], dtype=object)
#Plot a map of population density
gdf_Tract_utm.plot(column = 'PopDens')
<AxesSubplot: >
#Log transform the pop_dens data
import numpy as np
gdf_Tract_utm['PD_log'] = np.log(gdf_Tract_utm['PopDens'])
#Plot the distribution of log-transformed population density
gdf_Tract_utm['PD_log'].hist()
<AxesSubplot: >
#Plot a map of log-transformed population density
gdf_Tract_utm.plot(column = 'PD_log')
<AxesSubplot: >
Doc: https://geopandas.org/set_operations.html
Previously, we subset EV stations by an attribute (City). Here we'll see how we can instead select features spatially. We do this with GeoPandas' overlay operations.
To spatially select features:
- Ensure both datasets share the same coordinate reference system
- Use the overlay function with the intersection operation to select the EV features that overlap the tracts dataset

#Ensure both datasets share the same crs
gdf_Tract_utm.crs.to_string(), gdf_stations_all.crs.to_string()
#Project one dataset to match the other
gdf_stations_all_utm = gdf_stations_all.to_crs(gdf_Tract_utm.crs)
#plot points on tracts
myplot = gdf_Tract_utm.plot(figsize = (12,6),
alpha = 0.6)
gdf_stations_all_utm.plot(ax = myplot)
<AxesSubplot: >
#Intersect the two dataframes
gdf_stations_select = gpd.overlay(
df1 = gdf_stations_all_utm,
df2 = gdf_Tract_utm,
how = 'intersection')
#View the data
gdf_stations_select.head()
| ID | Fuel Type Code | Station Name | City | State | ZIP | Status Code | Latitude | Longitude | Facility Type | total_pop | County | ST | STATE | STCNTY | COUNTY | FIPS | SVI | PD_log | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39016 | ELEC | City of Raleigh - Municipal Building | Raleigh | NC | 27601 | E | 35.778416 | -78.643470 | STREET_PARKING | 2823 | Wake | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 7.182326 | POINT (713000.165 3961933.778) |
| 1 | 39017 | ELEC | City of Raleigh - Downtown | Raleigh | NC | 27601 | E | 35.774350 | -78.642287 | STREET_PARKING | 2823 | Wake | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 7.182326 | POINT (713117.97 3961485.268) |
| 2 | 40751 | ELEC | City of Raleigh - Wilmington Station Deck | Raleigh | NC | 27601 | E | 35.779370 | -78.638203 | PAY_GARAGE | 2823 | Wake | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 7.182326 | POINT (713473.768 3962051.084) |
| 3 | 42396 | ELEC | Raleigh Municipal Building Deck | Raleigh | NC | 27601 | E | 35.779082 | -78.641779 | MUNI_GOV | 2823 | Wake | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 7.182326 | POINT (713151.258 3962011.344) |
| 4 | 42397 | ELEC | City of Raleigh | Raleigh | NC | 27601 | T | 35.777283 | -78.639281 | STREET_PARKING | 2823 | Wake | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 7.182326 | POINT (713381.895 3961817.202) |
#Plot the data
#Construct the first (lowest layer) of the data to plot, saving it as "the_plot"
the_plot = gdf_Tract_utm.plot(
color='lightgrey', #Set the fill of the Tract features
edgecolor='grey', #Set the edge color...
alpha=0.4, #Set the transparency...
figsize=(12,12)) #Set the size of the figure
#Plot the stations, setting them to share the axes of "the_plot"
gdf_stations_select.plot(
ax=the_plot, #Draw it on the plot created above
column='SVI', #Color the features by values in the SVI column
markersize=45, #Set the size of the markers
edgecolor='white'); #Set the edge color of the markers
#Use Contextily to add a nice backdrop
ctx.add_basemap(
ax = the_plot, #Add it to our existing plot
crs=gdf_Tract_utm.crs, #Transform the background to match the data's crs
source=ctx.providers.CartoDB.Voyager) #Set the type of backdrop
Doc: https://geopandas.org/mergingdata.html#spatial-joins
Now that we have a proper subset of EV stations, let's add demographic data to our EV locations by performing a spatial join with the tract geodataframe.
→ What would happen if we made a typo and only included a single equals sign??
#Compare crs
gdf_stations_select.crs == gdf_Tract_utm.crs
True
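To answer the question above with a minimal sketch: a single `=` assigns rather than compares, so the check would silently overwrite the left-hand value. (With a GeoDataFrame, `gdf.crs = other.crs` is even more dangerous — it relabels the CRS metadata without reprojecting any coordinates.)

```python
# '==' compares and returns a boolean; a single '=' assigns.
crs_a, crs_b = 'EPSG:32617', 'EPSG:4326'

same = (crs_a == crs_b)  # a comparison: False
print(same)

crs_a = crs_b            # assignment: crs_a is silently overwritten
print(crs_a)
```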
#Execute the spatial join
gdf_JoinedData = gpd.sjoin(
left_df = gdf_stations_select,
right_df = gdf_Tract_utm,
how = 'left',
lsuffix = 'ev',
rsuffix = 'tracts')
#View the data
gdf_JoinedData.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Index: 313 entries, 0 to 312 Data columns (total 30 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 313 non-null int64 1 Fuel Type Code 313 non-null object 2 Station Name 313 non-null object 3 City 313 non-null object 4 State 313 non-null object 5 ZIP 313 non-null object 6 Status Code 313 non-null object 7 Latitude 313 non-null float64 8 Longitude 313 non-null float64 9 Facility Type 106 non-null object 10 total_pop_ev 313 non-null int32 11 County_ev 313 non-null object 12 ST_ev 305 non-null object 13 STATE_ev 305 non-null object 14 STCNTY_ev 305 non-null object 15 COUNTY_ev 305 non-null object 16 FIPS_ev 305 non-null object 17 SVI_ev 305 non-null float64 18 PD_log_ev 313 non-null float64 19 geometry 313 non-null geometry 20 index_tracts 313 non-null int64 21 total_pop_tracts 313 non-null int32 22 County_tracts 313 non-null object 23 ST_tracts 305 non-null object 24 STATE_tracts 305 non-null object 25 STCNTY_tracts 305 non-null object 26 COUNTY_tracts 305 non-null object 27 FIPS_tracts 305 non-null object 28 SVI_tracts 305 non-null float64 29 PD_log_tracts 313 non-null float64 dtypes: float64(6), geometry(1), int32(2), int64(2), object(19) memory usage: 73.4+ KB
#Plot
the_plot = gdf_Tract_utm.plot(
color='lightgrey',
edgecolor='grey',
alpha=0.4,
figsize=(12,12))
gdf_JoinedData.plot(
ax=the_plot,
column='SVI_ev',
markersize=60,
edgecolor='white');
ctx.add_basemap(
ax=the_plot,
crs=gdf_Tract_utm.crs,
source=ctx.providers.CartoDB.Voyager)
#Make some graphical plots
ax=pd.DataFrame(gdf_JoinedData).boxplot(
column='SVI_ev',
by='City',
rot=45,
figsize=(19,5));
With this document we now have a fully reproducible analytic workflow, complete with visualizations of our results. We can export this notebook as an HTML document and share that, or commit this document to our GitHub account and share a link to the notebook.
3.1.1
Export your completed notebook as an HTML document.
View it in a web browser
3.1.2
Commit the changes to your forked repository
Navigate to https://nbviewer.jupyter.org/ and paste your repository's URL
Share this link with others...
We can also save the resulting geodataframes as feature classes for more analysis, either in Python or in ArcGIS Pro.
We can export our gdf_JoinedData geodataframe easily, either as a shapefile or a CSV file.
#Export the geodataframe to a shapefile
gdf_JoinedData.to_file(
filename='../data/EV_sites_with_data.shp',
driver='ESRI Shapefile',
index=False
)
C:\Users\erk25\AppData\Local\Temp\ipykernel_7656\3809339296.py:2: UserWarning: Column names longer than 10 characters will be truncated when saved to ESRI Shapefile. gdf_JoinedData.to_file( C:\Users\erk25\AppData\Roaming\Python\Python311\site-packages\pyogrio\raw.py:723: RuntimeWarning: Normalized/laundered field name: 'Fuel Type Code' to 'Fuel Type' ogr_write( (...similar RuntimeWarnings repeated for each of the other long field names...)
#Export the geodataframe to a CSV file
pd.DataFrame(gdf_JoinedData).to_csv(
'../data/my_saved_data.csv',
index=False
)